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For Hermitian positive definite linear systems and eigenvalue problems, the eigCG algorithm is a 
memory efficient algorithm that solves the linear system and simultaneously computes some of its 
eigenvalues. The algorithm is based on the Conjugate-Gradient (CG) algorithm, however, it uses 
only a window of the vectors generated by the CG algorithm to compute approximate eigenvalues. 
The number and accuracy of the eigenvectors can be increased by solving more right-hand sides. 
For Hermitian systems with multiple right-hand sides, the computed eigenvectors can be used to 
speed up the solution of subsequent systems. The algorithm was tested on Lattice QCD problems 
by solving the normal equations and was shown to give large speed up factors and to remove 
the critical slowing down as we approach light quark masses. Here, an extension to the non- 
symmetric case based on the two-sided Lanczos algorithm is given. The new algorithm is tested 
on Lattice QCD problems and is shown to give very promising results. We also study the removal 
of the critical slowing down and compare results with those of the eigCG algorithm. We also 
discuss the case when the system is 75 -Hermitian. 
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1. Introduction 

Computation of various hadronic properties from Lattice QCD requires evaluation of the con- 
tribution of disconnected quark loops. This include, for example, the mass of the neutral pion, the 
spectrum of Isospin singlet mesons [[lj], and the contribution of strange sea quarks to the electro- 
magnetic form factors of the proton Evaluation of the contribution of disconnected quark 
loops requires knowledge of the quark propagator from all sites to all sites on the lattice (all-to-all 
propagators) |Q, ||, 01. In all-to-all propagator methods, one is required to compute the action of the 
inverse of the lattice Dirac operator A on a particular set of sources bi, i = 1,2, . . . ,N r , by solving 
the linear systems, 

Axi = bi, i=\,2,...,N r . (1.1) 

Typical values of N r are <^(50 — 100) and large values of N r are required for smaller statistical 
noise errors. In addition, for small quark masses, solving Eqs.|l.l| using standard iterative methods, 
such as GMRES or BiCGStab, converges very slowly (critical slowing down phenomena). It has 
been realized that critical slowing down could be removed by computing and deflating the lowest 
eigenmodes of A (see ^ for a review of deflation methods in lattice QCD). 

In ^ we have given the Incremental eigCG algorithm for Hermitian, positive definite sys- 
tems. The eigCG part of the algorithm solves a single system using the Conjugate Gradient (CG) 
algorithm and simultaneously computes few eigenvectors with smallest eigenvalues. EigCG uses 
only a small size window of the CG residuals for computing eigenvalues. In addition, the standard 
CG part of eigCG for solving the linear system is totally unaffected by the computation of the 
eigenvalues. For multiple right-hand sides, Incremental eigCG solves a small subset of the linear 
systems using eigCG and concurrently accumulates more eigenvectors as desired. The remaining 
systems are then solved with CG after deflating the computed eigenvectors from the initial guesses. 
Incremental eigCG was tested on large Lattice QCD problems [§§] with very small quark masses 
and was found to remove the critical slowing down as well as speed up the solution for multiple 
right-hand sides through deflation. Since the Dirac matrix A is non-Hermitian, it was necessary to 
apply Incremental eigCG to the normal equations, 

A*Axi=A%, i = 1,2,..., N r . (1.2) 

In this report, we present and test an extension of the ideas of eigCG and its incremental version to 
the non-Hermitian case. There are three motivations for studying this extension. First, converting 



the non-Hermitian system |L1] into the Hermitian, positive definite system |L2| leads to a more 
difficult system as the new system will have a worse condition number. Second, solving the non- 
Hermitian system will give eigenvalues of A directly which could be useful for other applications. 
Finally, one would like to compare the efficiency of removing the critical slowing down when 



solving the systems |L1| and |L2|. In the extension to non-Hermitian case, we first add functionality 
to the BiCG algorithm, following closely what was done in the Hermitian case, that allows for 
computing few eigenvalues using only a limited size window of the BiCG residuals. In this case 
we'll need to compute left and right eigenvectors of A. The modified BiCG algorithm will be 
called eigBiCG. For multiple right-hand sides, we solve a subset of the systems using eigBiCG and 
accumulate more eigenvectors and, hopefully, improve their accuracy using an incremental scheme 
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as was done in the Hermitian case. For the remaining systems, we deflate the components of the 
computed eigenvectors and then use BiCGStab to solve them. Using BiCGStab instead of BiCG 
is motivated by the fact that BiCGStab normally converges faster than BiCG. We chose BiCG for 
computing eigenvectors because the BiCG residuals and parameters can easily be related to the 
Bi-Lanczos vectors and projection matrix. We also discuss simplifications when A satisfies the 
75-Hermiticity condition 75A = A^y$. 

In the following, the dot product of two vectors will be denoted by (w, v) := w^v, the Euclidean 
norm of a vector is denoted by ||v|| and the complex conjugate of a number z will be denoted by 
z. The function [zr,zl,D] = eig(C) returns the right eigenvectors zr, the left eigenvectors zl and the 
eigenvalues array D sorted according to a user chosen criteria. 



2. Incremental eigBiCG algorithm 

In the BiLanczos algorithm one solves the dual systems Ax = b and A^x = b. Given xq and xo 
initial guesses, the algorithm builds a bi-orthogonal basis for the Krylov subspaces, 

JT(A,vi) = {vi,Avi,A 2 V!,...} (2.1) 
X(A\w\) = {wlAVAi,-}, 

where v\ = ro/||ro||, ro = b— Axq, ?o = b — A^xq and w\ = ^/(r^fo), so that w\v\ = 1. Normally, 
b = b, x = x , and w x = v x . Let W m ) = {vi,v 2 , . . . ,v m }, ff w = {w u w 2 , ■ ■ ■ ,w m } be the bi- 
orthogonal bases of JfT, and j£, and = W^UyH the projection matrix. Let and z {m) 
be the right and left eigenvectors of H^ m \ The approximate right and left Ritz eigenvectors of A 
are given by = yH^H an d zM = W^ m h {m) respectively. In the BiCG algorithm, the bi- 
orthogonal bases V, W and the projection matrix H are not computed explicitly. The basis vectors 
are obtained as the right and left BiCG residuals, while the matrix H is obtained from the BiCG 
scalar coefficients. This can be done by noting that vj = Tijrj_\ and wj = Cy^y-i for j = 1,2, ... , 
with r\j and satisfying (wj,vj) = 1. In the following, we choose a normalization such that 
I \vj\ I = 1 and Tjy is real positive, however, other normalizations that maintain the biorthogonality of 
Wj and Vj could be used. From these relations, and the biorthogonality of the BiCG residuals, the 
elements of H could be recovered from the scalar coefficients of BiCG without extra matrix- vector 
products. 

Similar to eigCG, the eigBiCG algorithm adds functionality to the standard BiCG algorithm 
for computing few eigenvalues and eigenvectors of A using only a window of size m of the BiCG 
residuals. For nev requested eigenvalues with a chosen criterion (smallest absolute value, for ex- 
ample), the eigenvalue part of eigBiCG computes eigenvectors from the size m and size (m — 1) 
subspaces. The search subspaces V^ m \ and the projection matrix are then restarted 

with these 2nev computed eigenvectors which are inexpensively biorthogonalized in the coefficient 
space. The details of this part are given in Algorithm BiCG — eigen. Note that we need m > 2nev. 
The full eigBiCG algorithm for solving the linear system Ax = b and computing nev eigenvalues 
and eigenvectors using a search subspace of dimension m is given in Algorithm eigBiCG. In order 
to compute the elements of the 2nev + 1-th row and column of H after restarting we need Av2nev+\ 
and A t H'2, t ev+i which will be given from Ary_i and A^fj-\. This can be accomplished by using the 
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relations pj_\ = rj_\ + f5j_2Pj-2 an d pj-\ = fj-\ + Pj-2Pj-2- So, we need to store Ap and A^p 
products when vs = m — \, where vs is the current size of the search subspaces. 



Algorithm: BiCG - eigen{nev, m, V ^ ,W^ m \H^ ) 

. [yW^W^W] =eig(H^), |ym-i) ;Z (m-i) ; £(m-i)] = e ig{H^ m - v >). 

• Append a zero at the end of each of the vectors y( m ~ l ) , z ( m ~ l \ 

• [y,z] = Bi-orthogonalize [y[ m \^ m) , . . . y^^Ky^"'^, ■ ■ ■ against 

[4 m) ,4 m) ,...,zS; z { r l \z { r l \---,zt~\ Note that y(»), z W are already 
biorthogonal. Need only to extend biorthogonality to the rest of vectors. 

• T=$HWy, [u,q,A]=eig(T). 

• U = V^yu, Q = W^ m hq. 

• Restart: 

-yM = [], **») = [], vgL = £/, < : l v = e. 

- H^'f = 0; i,; = 1,2, . . . ,m, ff/" = A,- for i = 1,2, ..,2nev. 



Algorithm Incremental eigBICG 

Given initial guesses x^ for = 1,2,... , A^ r : 

1. Choose nev,m and set t// = [ ], U r = [ ], and #=[ ]■ 

2. Forfc= 1,2,. ..,m 

• If f/ r is not empty, set x k = x^ + U r d, where Hd — U^(b k -Axq). 

• solve the system using eigBiCG(nev,m,V,W). 

• Compute [V',W] = biorthogonalize [V,W] against [U r ,Ui\. 

u ( H uJav'\ 

• COmpUtetheneW//= l v W'tA^wWJ 

• Add the new vectors: U t = [t// W']wAU r =[U r V'}. 

3. FORfe = m + i,m + 2,...,Ar r 

• x k Q =x k + U r d, where Hd = Uj {b k - Ax k Q ). 

• Solve the system using BiCGStab. 

• Repeat the deflation and restart BiCGStab when the residual is less than 
DefTol * \ \b\\. 
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Algorithm: eigBiCG(nev, m, A, U, Q) 



1. Choose initial guess xq, compute ro = b — Axq, and set po = ro. 

2. Choose fo such that (ro, ro) =/= 0, and set po = Pq. Set po = (r , r ), vs = 0. 

3. For 7 = 1,2,... till convergence 

• Compute Apj-\ andA 1 "/?,-!. 

• Compute c 7 _i = (pj-i,Apj_i) and a,_i = ^=i. Setx 7 =Xj-\ + aj-ipk-i- 

• Ifvs = m — l, q= Apj-i, s = A* pj-\. 

• If vs = m 

- Compute eigenvalues and restart the search subspace and projection ma- 
trix using the algorithm BiCG — eigen(nev,m,V,W,H). Set vs = 2nev. 

- Compute the H k)2 „ev+\ and H 2 „ ev +i,k for A: = 1, 2, ... , 2nev. 

H 2 nev+l,k = ^-^(A^pj-i-fij^Vk. 
H k ,2nev+\ = T^ w l( A Pj-l-Pj-2l)- 

• vs = vs+l, v vs = 1 j-L- 1[ r 7 -_ 1 , w v , = J!jgi^_ 1 . 

• Compute r 7 = r 7 _i - cCj-\Apj_i and f 7 = r fc _i - aj-\A^pj-\. 

• Set p 7 = (r/,r 7 ) and compute j3 7 _i = -p-^. 

• Set P./ = r j + Pj- IP J- 1 and Pj = ?j + Pj-iPj- 1 • 

• Compute the diagonal H matrix elements: 

ifi-1 w — 1 plcp fj — _J l PJ~ 2 

n J — i, n VS:VS — aj l , cise n VS y S — ^ -|- a ^ . 

• If vs < m, compute the off-diagonal H matrix elements: 

H __tulP^ ± „ _ llo-ll i 

«vs,vs+l - || r .|| a ._j, '• 1 ' " Hr^ill a 7 _r 

• If | \ r j\ | < ^o/ * | \b\ | for a given tolerance tol, stop the iterations. 

4. Using V (™), and compute the final nev eigenvalues and eigenvectors: 

[j( vs ), z ( V5 ),A( vs )] = eig{H^), u( nev ) = yWj,M : _^(vj) z (vs)_ 



For multiple right-hand sides, we use the Incremental eigBiCG algorithm. After solving a sub- 
set of the right-hand sides and accumulating the deflation subspaces [// and U r , we use BiCGStab 
on the remaining systems after deflating the eigenvector components. Since computed eigenvec- 
tors are not exact we might need to repeat the deflation step and restart BiCGStab depending on 
the accuracy of the eigenvectors. The deflation restart tolerance is called DefTol. In addition, final 
eigenvectors computed from incremental eigBiCG could be computed, if necessary, using Raleigh- 
Ritz with Ui, and U r as search subspaces. 
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3. Results 

The algorithm is preliminary tested on two quenched Wilson lattice QCD matrices with even- 
odd preconditioning nearf^,.,^/. The first is a 8 4 lattice at /3 =5.5 with m q = —1.25 where 
K = 8+ i, m . This case will be labeled as QCD49K — eo since the Dirac matrix will be of size 49, 152 
before the even-odd precondtioning. The second is a 12 4 lattice at j8 = 5.8 with m q = —0.95. This 
case will be labeled as QCD2A9K — eo. We first compare the lowest eigenvalues computed with 
eigBiCG to those computed with un-restarted BiCG in which all the residuals were stored. As seen 
from Table[I], tne resu l ts from eigBiCG with a limited storage gives eigenvalues in close agreement 
with un-restarted BiCG where all the residuals were stored. We next study how incremental eig- 
BiCG could speed up the solution with many right-hand sides. In Figure |], we show the effect of 
deflation for different choices of nev and m after solving n\ right-hand sides, showing a speed up 
factor of about 2.5. In Figure Q we show the effect of reducing the quark mass on the number of 
iterations used by BiCGStab and compare it to the case when solving the normal equations using 
Incremental EigCG. The results show that Incremental eigBiCG is competitive with eigCG but 
not necessarily better. We note that both BiCGStab and CG applied to the normal equations use 
two matrix-vector products per iteration. A better comparison between the two methods requires 
experiments on larger lattice QCD matrices. 



Method 


Eigenvalues 


Residuals 


eigBiCG 


3.46577e-03-1.07644e-13i 
1 . 35450e-02+ 1 .72604e-02i 
1.35450e-02-1.72604e-02i 
2.82870e-02+1.09765e-07i 
1.51950e-02-2.26792e-02i 


1.71e-07 
1.69e-06 
1.37e-06 
2.40e-03 
9.35e-01 


BiCG 


3.46577e-03+7.84451e-15i 
1 . 35450e-02+ 1 .72604e-02i 
1.35450e-02-1.72604e-02i 
2.82870e-02+1.09755e-07i 
1.36523e-02+4.16515e-02i 


1.43e-08 
1.72e-06 
1.40e-06 
2.39e-03 
1.52e-06 



Table 1: Comparing lowest 5 eigenvalues for QCD249K-eo obtained with un-restarted BiCG and with 
eigBiCG using nev =15 and m = 40. The tolerance for the linear system was chosen to be le — 08 and the 
system converged in 592 iterations 



4. y 5 -Hermitian systems 

For Wilson and Clover fermions we have the symmetry 

y 5 A=A^y 5 . (4.1) 

Using this symmetry we can replace the costly matrix-vector multiplication with A' in eigBiCG 
with the cheaper multiplication with 75. In BiCG, if we chose tq = JsTq then it follows that fj = y$rj 
and the search directions pj = y$pj for subsequent iterations. Also, eigenvalues will be real or come 
in pairs of conjugate values, and left eigenvectors are computable from right ones, as long as we 
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QCD 49K, even/odd, tol=1 e-8 QCD 249K, even/odd, tol=1 e-8 
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Figure 1: Deflated BiCGStab using eigenvectors computed with eigBiCG for nl right-hand sides for differ- 
ent choices of nev and m. No restarting was needed (DefTol = was used). 
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Figure 2: Removing critical slowing down using eigBiCG and eigCG applied to the normal equations. 



keep eigenvectors corresponding to conjugate pairs. Using these relations, we can simplify the 
first phase of Incremental eigBiCG where eigenvalues are computed. For illustration, we show 
preliminary results comparing the two versions of the algorithm in Figure ||. The result shows a 
similar performance In which only the right eigenvectors need to be stored and where matrix- vector 
multiplication with A* is avoided. 

5. Conclusions 

Extending the ideas behind the successful eigCG algorithm to non-Hermitian systems gave 
very promising results. The new algorithm gave access to the left and right eigenvectors of the 
Dirac matrix while solving the linear systems using only a limited storage. It was also shown to 
remove the critical slowing down and to be competitive with eigCG. For 75-Hermitian systems, 
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QCD49K-eo, nev=20, m=60, n1=10 
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Figure 3: Comparison of deflation with 75-Hermitian algorithm. 



preliminary study shows that storage of the left eigenvectors and multiplication with A 1 " could be 
avoided. 
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